This work examines the data and interprets the results of a study conducted by Dalia Research in April 2016.
The distribution of respondents by age among men and women is almost the same.
# DISTRIBUTION BY AGE + GENDER
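# Packages used throughout this analysis
library(data.table)
library(ggplot2)
library(plotly)
ggplot(data = income_data, aes(x = age)) +
  ggtitle('The distribution of respondents by age') +
  # Scale the histogram to density so it is comparable with the density curve
  geom_histogram(aes(y = after_stat(density)),
                 col = 'black',
                 fill = 'white') +
  geom_density(alpha = 0.2, fill = '#FF6666') +
  # One panel per gender, stacked vertically
  facet_grid(gender ~ .)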
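The visual impression that both distributions are almost the same can be cross-checked with a few summary statistics per gender. The following is a minimal sketch on hypothetical toy data; the table `toy`, its values, and the coding of `gender` as "male"/"female" are assumptions made for illustration and do not come from the survey.

```r
library(data.table)

# Hypothetical toy data standing in for income_data; the real survey values differ.
toy <- data.table(
  age    = c(18, 25, 33, 41, 52, 20, 27, 35, 44, 60),
  gender = rep(c("male", "female"), each = 5)
)

# Per-gender summary of age: group size, median, mean, and quartiles.
age_summary <- toy[, .(
  n      = .N,
  median = median(age),
  mean   = mean(age),
  q25    = quantile(age, 0.25),
  q75    = quantile(age, 0.75)
), by = gender]

print(age_summary)
```

If the medians and quartiles of the two groups are close, that supports reading the two histograms as nearly identical.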
A slight shift towards men is noticeable.
# DISTRIBUTION BY GENDER (SHARE)
ggplotly(
  ggplot(data = income_data, aes(x = age, fill = gender)) +
    ggtitle('Share of men and women by age') +
    labs(x = 'Age', y = 'Share of gender') +
    # position = 'fill' stacks each one-year bin to a total height of 1
    geom_histogram(position = 'fill', alpha = 0.7, binwidth = 1) +
    scale_x_continuous(breaks = seq(10, max(income_data[, age]), 5)) +
    # White reference line at the 50% mark
    geom_hline(yintercept = 0.5, colour = 'white')
)
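The shares drawn by `position = 'fill'` can also be computed as a table, which makes it easier to read exact values for a given age. This is a sketch on hypothetical toy data; `toy` and the "male"/"female" coding are assumptions, not values from the survey.

```r
library(data.table)

# Hypothetical toy data: two ages with different gender mixes.
toy <- data.table(
  age    = c(20, 20, 20, 20, 30, 30, 30, 30),
  gender = c("male", "male", "male", "female",
             "male", "female", "female", "female")
)

# Count respondents per (age, gender), then divide by the total per age.
# Within each age the shares sum to 1, like the bars in the filled histogram.
share_by_age <- toy[, .(n = .N), by = .(age, gender)][
  , share := n / sum(n), by = age][]

print(share_by_age)
```

A share above 0.5 for men at a given age corresponds to the bar crossing the white reference line in the plot.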
## What about education?
I renamed the values of the dem_education_level field so that the levels sort in rank order.
# Prefix the levels with a rank so that they sort from high to low
income_data[dem_education_level == 'no', dem_education_level := 'no']  # no-op: 'no' already sorts last
income_data[dem_education_level == 'low', dem_education_level := '3. low']
income_data[dem_education_level == 'medium', dem_education_level := '2. medium']
income_data[dem_education_level == 'high', dem_education_level := '1. high']
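An alternative to renaming the values is to store the field as a factor with an explicit level order, which keeps the original labels in plots and tables. The sketch below is not the approach used in this post; `toy`, `edu_levels`, and `edu_ordered` are hypothetical names, and it assumes the raw values are "high", "medium", "low", and "no".

```r
library(data.table)

# Hypothetical toy data with the raw (unprefixed) education values.
toy <- data.table(dem_education_level = c("low", "high", "no", "medium", "high"))

# Fix the display order explicitly instead of relying on alphabetical sorting.
edu_levels <- c("high", "medium", "low", "no")
toy[, edu_ordered := factor(dem_education_level, levels = edu_levels)]

# Tables and ggplot legends now follow the order given in edu_levels.
print(levels(toy$edu_ordered))
print(table(toy$edu_ordered))
```

The trade-off is that a factor silently turns any value outside `edu_levels` into `NA`, so it is worth checking for unexpected values first.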
Now let’s see how our respondents are distributed by age and level of education.
ggplotly(
  ggplot(data = income_data, aes(x = age, fill = dem_education_level)) +
    ggtitle('Distribution by age and level of education') +
    # x is age; the histogram height is the number of respondents per bin
    labs(x = 'Age', y = 'Number of respondents') +
    geom_histogram(bins = 50,
                   alpha = 0.7)
)
Let’s look at the shares. Most respondents have a medium or high level of education.
ggplotly(
  ggplot(data = income_data, aes(x = age, fill = dem_education_level)) +
    ggtitle('Share of education level by age') +
    labs(x = 'Age', y = 'Share') +
    # position = 'fill' rescales every bin so that the levels sum to 1
    geom_histogram(bins = 50,
                   position = 'fill',
                   alpha = 0.7)
)
Let’s see how the education levels are distributed within each age group of respondents.
round(prop.table(table(income_data$dem_education_level, income_data$age_group),margin = 2),2)
##
## 14_25 26_39 40_65
## 1. high 0.25 0.46 0.35
## 2. medium 0.41 0.36 0.42
## 3. low 0.29 0.15 0.20
## no 0.05 0.03 0.03
# Convert the share table to long format: one row per (education level, age group) pair
prob_t <- data.table(round(prop.table(table(income_data$dem_education_level, income_data$age_group), margin = 2), 2))
setnames(prob_t, c('dem_education_level', 'age_group', 'probability'))
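The `margin` argument of `prop.table` decides what the shares are relative to: `margin = 2` normalises each column (here, each age group) to 1, while `margin = 1` normalises each row. A small sketch with a hypothetical matrix of counts; the numbers are made up for illustration.

```r
# Hypothetical 2 x 2 contingency table of counts
# (rows: education level, columns: age group).
counts <- matrix(c(10, 30, 20, 40), nrow = 2,
                 dimnames = list(level = c("high", "low"),
                                 age_group = c("14_25", "26_39")))

col_shares <- prop.table(counts, margin = 2)  # each column sums to 1
row_shares <- prop.table(counts, margin = 1)  # each row sums to 1

print(round(col_shares, 2))
print(colSums(col_shares))
print(rowSums(row_shares))
```

Because the table in this post is rounded to two decimals, a column of displayed shares can differ from 1 by a rounding error even though the unrounded shares sum to exactly 1.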
Now let’s display the shares on a bar plot.
ggplotly(
  ggplot(data = prob_t, aes(x = age_group, y = probability, fill = dem_education_level)) +
    ggtitle('Share of education level by age group') +
    labs(x = 'Age Group', y = 'Share') +
    # geom_col() draws bars at the precomputed shares (same as stat = 'identity')
    geom_col(alpha = 0.7,
             col = 'black')
)
To be continued…